[Content Understanding] Update toLlmInput page markers and filter LLMStats telemetry #38851
Draft
chienyuanchang wants to merge 2 commits into
pitbull231980-dotcom
approved these changes
Jun 6, 2026
Packages impacted by this PR
- @azure/ai-content-understanding

Issues associated with this PR
- `LLMStats:` filtering: Python: Adopt azure-ai-contentunderstanding `to_llm_input` in CU context provider microsoft/agent-framework#5796

Describe the problem that is addressed by this PR
The `toLlmInput()` helper renders Content Understanding `AnalysisResult` objects into LLM-friendly text. Two output-hygiene issues need to be addressed before the next CU service release:
- The helper currently marks page boundaries with `<!-- page N -->`. The upcoming service release (per ContentUnderstanding-Docs#249) will emit the same boundary using `<!-- InputPageNumber: N -->`. The SDK should adopt the new format and avoid emitting duplicate markers when the service-supplied markdown already contains them.
- Telemetry entries (for example `LLMStats: completion calls: 2; embedding calls: 1; completion latency: 7.71s`) appear in the `warnings` collection. These are not Responsible-AI warnings, and downstream consumers (Agent Framework, LangChain) currently strip them with local regex workarounds. The SDK should filter them at the source so the noise never reaches the LLM-facing `rai_warnings` block.

What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?
This PR makes the smallest possible surface change inside `toLlmInput()`:
- Adds an `INPUT_PAGE_MARKER_PREFIX` constant and a `hasInputPageMarker()` check at the top of `addPageMarkers()`. If the markdown already includes any `<!-- InputPageNumber:` substring (case-sensitive), the markdown passes through unchanged. Otherwise the new-format marker is injected via the existing spans / PageBreak paths.
- Adds a `TELEMETRY_MESSAGE_PREFIXES = ["LLMStats:"]` list and a small `isTelemetryMessage()` predicate. Inside `formatWarnings()`, entries whose `message` (after trimming leading whitespace) starts with any prefix are skipped. Filtering is scoped to the structured warnings list only; the document markdown body is never inspected, so legitimate `LLMStats:` text in documents is preserved.

Alternative considered: post-rendering regex on the YAML output (the workaround currently used by Agent Framework). Rejected because operating on the structured list before rendering is simpler, more robust to YAML escaping, and idempotent.
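The two checks described above can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: the constant and predicate names come from the description, but their bodies, the `WarningLike` type, and the `withPageMarker()` and `dropTelemetry()` helpers are assumptions made for the example. In particular, `withPageMarker()` simply prepends one marker, whereas the real `addPageMarkers()` injects markers via the spans / PageBreak paths.

```typescript
// Names taken from the PR description; the bodies below are assumed.
const INPUT_PAGE_MARKER_PREFIX = "<!-- InputPageNumber:";
const TELEMETRY_MESSAGE_PREFIXES = ["LLMStats:"];

// Hypothetical minimal warning shape; the SDK's real type may differ.
interface WarningLike {
  code?: string;
  message?: string;
}

// True when the markdown already carries a new-format marker (case-sensitive).
function hasInputPageMarker(markdown: string): boolean {
  return markdown.indexOf(INPUT_PAGE_MARKER_PREFIX) !== -1;
}

// True when the message, ignoring leading whitespace, starts with a telemetry prefix.
function isTelemetryMessage(message: string | undefined): boolean {
  if (!message) {
    return false;
  }
  const trimmed = message.replace(/^\s+/, "");
  return TELEMETRY_MESSAGE_PREFIXES.some((prefix) => trimmed.indexOf(prefix) === 0);
}

// Hypothetical stand-in for addPageMarkers(): pass through if a marker exists,
// otherwise prepend a single new-format marker for the given page.
function withPageMarker(markdown: string, pageNumber: number): string {
  if (hasInputPageMarker(markdown)) {
    return markdown;
  }
  return `${INPUT_PAGE_MARKER_PREFIX} ${pageNumber} -->\n${markdown}`;
}

// Hypothetical stand-in for the filtering step inside formatWarnings():
// operates on the structured list only, never on the markdown body.
function dropTelemetry(warnings: WarningLike[]): WarningLike[] {
  return warnings.filter((warning) => !isTelemetryMessage(warning.message));
}
```

Because both steps are pure functions over their inputs, applying them twice gives the same result as applying them once, which is the idempotence property the post-rendering regex alternative lacks.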
Are there test cases added in this PR? (If not, why?)
Yes. Updated existing tests for the new marker format and added six new unit tests:
- `LLMStats:` warnings dropped while real warnings are kept.
- `rai_warnings` block omitted entirely when only `LLMStats:` warnings exist.
- Matching is case-sensitive (lowercase `llmstats:` is preserved).
- Document-body `LLMStats:` text is preserved verbatim.
- `LLMStats:` warnings are filtered.

All 37 unit tests in `test/public/node/llmInputHelper.spec.ts` pass locally.

Provide a list of related PRs (if any)
Companion PRs in sibling SDKs:
Command used to generate this PR: (Applicable only to SDK release request PRs)
Not applicable. This PR modifies hand-authored helper code; no regeneration was performed.
Checklists
- `src/static-helpers/llmInputHelper.ts` (not generated).
- `CHANGELOG.md` updated under `1.2.0-beta.2 (Unreleased)`.